Linked Data for Information Extraction Challenge 2014: Tasks and Results

Authors

  • Robert Meusel
  • Heiko Paulheim
Abstract

For the Web of Linked Data to grow, information extraction methods are a good alternative to manual dataset curation, since an abundance of semi-structured and unstructured information can be harvested that way. At the same time, existing Linked Data sets can be used to train and evaluate such information extraction systems. In this paper, we introduce the Linked Data for Information Extraction Challenge 2014. Using person data in Microformats as an example, we show how training and testing data can be curated at large scale. Furthermore, we discuss the results achieved in the challenge, as well as open problems and future directions for the challenge.
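The abstract's idea can be sketched concretely. It says that existing markup can supply training and testing data at scale. Under that idea, a page already annotated with hCard (the Microformats format for person data) yields gold-standard (property, value) labels, and the same page with its markup stripped becomes the input for an extraction system. This is a minimal sketch, not the challenge's actual curation pipeline. Several parts are assumptions:

- The class `HCardLabeler` is hypothetical.
- The chosen subset of hCard property names is an illustrative choice.
- Only element text is read; `href`-based values are ignored.
- The HTML is assumed to be reasonably well formed.

```python
from html.parser import HTMLParser

# Real hCard class names; which properties to label is an assumption here.
HCARD_PROPS = {"fn", "given-name", "family-name", "org", "title",
               "tel", "email", "locality", "country-name"}
# HTML void elements never receive an end tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HCardLabeler(HTMLParser):
    """Hypothetical helper: hCard markup -> gold labels plus plain page text."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements: [tag, inside_vcard, prop_or_None, text_parts]
        self.labels = []  # gold (property, value) pairs read from the markup
        self.text = []    # page text with all markup removed (extractor input)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        # An element is inside a card if it or any ancestor has class="vcard".
        inside = "vcard" in classes or bool(self.stack and self.stack[-1][1])
        props = [c for c in classes if c in HCARD_PROPS]
        prop = props[0] if inside and props else None
        self.stack.append([tag, inside, prop, []])

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing elements carry no text

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                while len(self.stack) > i:
                    _, _, prop, parts = self.stack.pop()
                    value = " ".join("".join(parts).split())
                    if prop and value:
                        self.labels.append((prop, value))
                return

    def handle_data(self, data):
        self.text.append(data)
        # Text counts toward every enclosing property element.
        for entry in self.stack:
            if entry[2]:
                entry[3].append(data)

    def plain_text(self):
        """Whitespace-normalized text, i.e. the page without its annotations."""
        return " ".join("".join(self.text).split())


# Hypothetical example page, not data from the challenge.
page = """<div class="vcard">
  <span class="fn">Jane Doe</span>, <span class="title">Research Engineer</span>
  at <span class="org">Example Corp</span>
</div>"""

labeler = HCardLabeler()
labeler.feed(page)
labeler.close()
print(labeler.labels)
print(labeler.plain_text())
```

Pairing `plain_text()` with `labels` gives one training or test instance per annotated page. Because the labels come from markup that site owners already publish, this step needs no manual annotation. That is what makes curation at large scale plausible, although real Microformats data is considerably noisier than this toy page.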

Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

A Template-Based Information Extraction from Web Sites with Unstable Markup

This paper presents results of a work on crawling CEUR Workshop proceedings web site to a Linked Open Data (LOD) dataset in the framework of ESWC 2014 Semantic Publishing Challenge 2014. Our approach is based on using an extensible template-dependent crawler and DBpedia for linking extracted entities, such as the names of universities and countries.

Unstable markup: A template-based information extraction from web sites with unstable markup

This paper presents results of a work on crawling CEUR Workshop proceedings web site to a Linked Open Data (LOD) dataset in the framework of Semantic Publishing Challenge 2014. Our approach is based on so-called “templates of web site’s blocks” and DBpedia for crawling and linking extracted entities.

Precise Medication Extraction using Agile Text Mining

Agile text mining is widely used for commercial text mining in the pharmaceutical industry. It can be applied without building an annotated training corpus, so it is well-suited to novel or one-off extraction tasks. In this work we wanted to see how efficiently it could be adapted for healthcare extraction tasks such as medication extraction. The aim was to identify medication names, associated do...

Semantic Publishing Challenge - Assessing the Quality of Scientific Output by Information Extraction and Interlinking

The Semantic Publishing Challenge series aims at investigating novel approaches for improving scholarly publishing using Linked Data technology. In 2014 we had bootstrapped this effort with a focus on extracting information from non-semantic publications – computer science workshop proceedings volumes and their papers – to assess their quality. The objective of this second edition was to improv...

Publication date: 2014